Attachment B

In the Claims: 

1. (original) A method of determining cluster attractors for a plurality of
documents, each document comprising at least one term, the method comprising:
calculating, in respect of each term, a probability distribution indicative of the
frequency of occurrence of the, or each, other term that co-occurs with said term
in at least one of said documents; calculating, in respect of each term, the entropy
of the respective probability distribution; selecting at least one of said probability
distributions as a cluster attractor depending on the respective entropy value.
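
The steps recited in Claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the use of raw co-occurrence counts normalized to sum to one, the base-2 logarithm, and the "entropy at or below a threshold" selection rule are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_distributions(docs):
    """Map each term to a distribution over the other terms it shares a document with.

    Assumption: the distribution is the normalized count of instances of each
    co-occurring term, summed over the documents in which both terms appear.
    """
    counts = defaultdict(Counter)
    for doc in docs:
        tf = Counter(doc)
        for term in tf:
            for other, n in tf.items():
                if other != term:
                    counts[term][other] += n
    dists = {}
    for term, c in counts.items():
        total = sum(c.values())
        dists[term] = {other: n / total for other, n in c.items()}
    return dists

def entropy(dist):
    """Shannon entropy (in bits) of a distribution given as {term: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def select_attractors(docs, threshold):
    """Keep the distributions whose entropy does not exceed the (hypothetical) threshold."""
    dists = cooccurrence_distributions(docs)
    return {t: d for t, d in dists.items() if entropy(d) <= threshold}

docs = [["a", "b"], ["a", "b"], ["c", "d", "e"]]
attractors = select_attractors(docs, threshold=0.5)
```

In this toy corpus the terms "a" and "b" only ever co-occur with each other, so their distributions have zero entropy and are selected, while "c", "d" and "e" each spread their probability over two neighbours and are not.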

2. (original) A method as claimed in Claim 1, wherein each probability
distribution comprises, in respect of each co-occurring term, an indicator that is
indicative of the total number of instances of the respective co-occurring term in
all of the documents in which the respective co-occurring term co-occurs with the
term in respect of which the probability distribution is calculated.

3. (currently amended) A method as claimed in Claim 1 or 2, claim 1, wherein
each probability distribution comprises, in respect of each co-occurring term, an
indicator comprising a conditional probability of the occurrence of the respective
co-occurring term in a document given the appearance in said document of the
term in respect of which the probability distribution is calculated.

4. (currently amended) A method as claimed in any one of Claims 1 to 3, claim 1,
wherein each indicator is normalized with respect to the total number of terms in
the, or each, document in which the term in respect of which the probability
distribution is calculated appears.

5. (original) A method as claimed in Claim 1, comprising assigning each term to
one of a plurality of subsets of terms depending on the frequency of occurrence of
the term; and selecting, as a cluster attractor, the respective probability
distribution of one or more terms from each subset of terms.

6. (original) A method as claimed in Claim 5, wherein each term is assigned to a
subset depending on the number of documents of the corpus in which the respective
term appears.

7. (currently amended) A method as claimed in Claim 5 or 6, claim 5, wherein an
entropy threshold is assigned to each subset, the method comprising selecting, as a
cluster attractor, the respective probability distribution of one or more terms from
each subset having an entropy that satisfies the respective entropy threshold.

8. (original) A method as claimed in Claim 7, comprising selecting, as a cluster
attractor, the respective probability distribution of one or more terms from each
subset having an entropy that is less than or equal to the respective entropy
threshold.

9. (currently amended) A method as claimed in any one of Claims 5 to 8, claim 5,
wherein each subset is associated with a frequency range and wherein the
frequency ranges for respective subsets are disjoint.

10. (currently amended) A method as claimed in any one of Claims 5 to 9, claim 5,
wherein each subset is associated with a frequency range, the size of each
successive frequency range being equal to a constant multiplied by the size of the
preceding frequency range in order of increasing frequency.

11. (currently amended) A method as claimed in any one of Claims 7 to 10, claim
7, wherein the respective entropy threshold increases for successive subsets in
order of increasing frequency.

12. (original) A method as claimed in Claim 11, wherein the respective entropy
threshold for successive subsets increases linearly.
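
The subset scheme of Claims 5 to 12 can be sketched as follows. This is an illustrative reading only: the half-open integer ranges starting at a document frequency of 1, the parameter names (`first_size`, `ratio`, `base`, `step`) and the input format of `select_by_subset` are hypothetical and are not specified in the claims.

```python
def frequency_ranges(max_freq, first_size, ratio):
    """Build disjoint half-open ranges [lo, hi) covering frequencies 1..max_freq.

    Each range is `ratio` times the size of the preceding one (Claim 10).
    """
    ranges, lo, size = [], 1, first_size
    while lo <= max_freq:
        ranges.append((lo, lo + size))
        lo += size
        size *= ratio
    return ranges

def entropy_thresholds(n_subsets, base, step):
    """Thresholds that increase linearly with subset index (Claims 11 and 12)."""
    return [base + i * step for i in range(n_subsets)]

def subset_index(doc_freq, ranges):
    """Return the index of the range containing a term's document frequency (Claim 6)."""
    for i, (lo, hi) in enumerate(ranges):
        if lo <= doc_freq < hi:
            return i
    raise ValueError(f"document frequency {doc_freq} is outside all ranges")

def select_by_subset(term_stats, ranges, thresholds):
    """Select terms whose entropy is <= the threshold of their subset (Claim 8).

    term_stats maps term -> (document frequency, entropy); hypothetical format.
    """
    return sorted(
        term for term, (df, h) in term_stats.items()
        if h <= thresholds[subset_index(df, ranges)]
    )

ranges = frequency_ranges(max_freq=20, first_size=2, ratio=2)
thresholds = entropy_thresholds(len(ranges), base=1.0, step=0.5)
chosen = select_by_subset(
    {"x": (2, 0.9), "y": (2, 1.2), "z": (10, 1.9)}, ranges, thresholds
)
```

Here "y" is rejected because its entropy exceeds the threshold of the lowest-frequency subset, whereas "z" is accepted with a higher entropy because it falls in a higher-frequency subset with a more permissive threshold.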

13. (original) A computer program product comprising computer program code
for causing a computer to perform the method of Claim 1.

14. (original) An apparatus for determining cluster attractors for a plurality of
documents, each document comprising at least one term, the apparatus
comprising: means for calculating, in respect of each term, a probability
distribution indicative of the frequency of occurrence of the, or each, other term
that co-occurs with said term in at least one of said documents; means for
calculating, in respect of each term, the entropy of the respective probability
distribution; and means for selecting at least one of said probability distributions
as a cluster attractor depending on the respective entropy value.

15. (original) A method of clustering a plurality of documents, each document
comprising at least one term, the method comprising determining cluster attractors
in accordance with Claim 1.

16. (original) A method as claimed in Claim 15, comprising: calculating, in
respect of each document, a probability distribution indicative of the frequency of
occurrence of each term in the document; comparing the respective probability
distribution of each document with each probability distribution selected as a
cluster attractor; and assigning each document to at least one cluster depending on
the similarity between the compared probability distributions.
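
The assignment step of Claim 16 might look like the sketch below. The claim does not name a similarity measure; cosine similarity is swapped in here purely as an assumption, and each document is assigned to the single most similar attractor although the claim allows more than one cluster. All function names are hypothetical.

```python
import math
from collections import Counter

def term_distribution(doc):
    """Relative frequency of each term within one document."""
    tf = Counter(doc)
    total = sum(tf.values())
    return {t: n / total for t, n in tf.items()}

def cosine(p, q):
    """Cosine similarity between two sparse distributions (assumed measure)."""
    dot = sum(v * q.get(t, 0.0) for t, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0

def assign(docs, attractors):
    """Assign each document index to the cluster of its most similar attractor."""
    clusters = {name: [] for name in attractors}
    for i, doc in enumerate(docs):
        p = term_distribution(doc)
        best = max(attractors, key=lambda name: cosine(p, attractors[name]))
        clusters[best].append(i)
    return clusters

attractors = {
    "pets": {"cat": 0.5, "dog": 0.5},
    "cars": {"engine": 0.5, "wheel": 0.5},
}
docs = [["cat", "dog", "cat"], ["wheel", "engine"], ["dog"]]
clusters = assign(docs, attractors)
```

A divergence between distributions could be substituted for `cosine` without changing the structure of `assign`, provided the `max` is replaced by a `min`.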

17. (original) A method as claimed in Claim 16, comprising organising the
documents within each cluster by: assigning a respective weight to each
document, the value of the weight depending on the similarity between the
probability distribution of the document and the probability distribution of the
cluster attractor; comparing the respective probability distribution of each
document in the cluster with the probability distribution of each other document in
the cluster; assigning a respective weight to each pair of compared documents, the
value of the weight depending on the similarity between the compared respective
probability distributions of each document of the pair; calculating a minimum
spanning tree for the cluster based on the respective calculated weights.
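
The final step of Claim 17 can be illustrated with Prim's algorithm over a complete graph of the documents in one cluster. The claim does not say which spanning-tree algorithm is used or how similarity is turned into a weight; this sketch assumes the caller supplies a symmetric `weight(u, v)` function in which smaller values mean more similar documents, and it omits the document-to-attractor weights.

```python
def minimum_spanning_tree(n, weight):
    """Prim's algorithm on the complete graph over nodes 0..n-1.

    weight(u, v) is assumed symmetric, with lower values for more similar nodes.
    Returns the tree as a list of (parent, child) edges, grown from node 0.
    """
    if n == 0:
        return []
    # best[v] = (cheapest known weight linking v to the tree, tree node it links to)
    best = {v: (weight(0, v), 0) for v in range(1, n)}
    edges = []
    while best:
        v = min(best, key=lambda k: best[k][0])
        _, u = best.pop(v)
        edges.append((u, v))
        # Newly added node v may offer cheaper links to the remaining nodes.
        for x in best:
            w = weight(v, x)
            if w < best[x][0]:
                best[x] = (w, v)
    return edges

# Toy example: documents placed on a line, weight = distance between them.
positions = [0, 1, 3, 7]
tree = minimum_spanning_tree(
    len(positions), lambda u, v: abs(positions[u] - positions[v])
)
```

With a similarity score `s` in the range 0 to 1, one possible weight is `1 - s`, so that the tree links each document to its most similar neighbours.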

18. (original) A computer program product comprising computer program code
for causing a computer to perform the method of Claim 15.



